Add FFT support via AbstractFFTs interface #713
KaanKesginLW wants to merge 43 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files

| Coverage Diff | main | #713 | +/- |
|---|---|---|---|
| Coverage | 80.77% | 80.78% | |
| Files | 61 | 63 | +2 |
| Lines | 2866 | 3008 | +142 |
| Hits | 2315 | 2430 | +115 |
| Misses | 551 | 578 | +27 |
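The coverage percentages follow directly from the hit and line counts (coverage is hits divided by lines, and misses are the remaining lines). Checking both columns, and the hit rate of the net added lines, which is slightly above the previous overall rate and explains the small increase (this net figure is not necessarily the same as Codecov's patch coverage):

$$
\begin{aligned}
\text{coverage} &= \frac{\text{hits}}{\text{lines}}, \qquad \text{misses} = \text{lines} - \text{hits} \\
\text{main:}\quad \frac{2315}{2866} &\approx 0.807746 \;\Rightarrow\; 80.77\%, \qquad 2866 - 2315 = 551 \\
\text{\#713:}\quad \frac{2430}{3008} &\approx 0.807846 \;\Rightarrow\; 80.78\%, \qquad 3008 - 2430 = 578 \\
\text{net added lines:}\quad \frac{115}{142} &\approx 0.810 > 0.807746
\end{aligned}
$$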
Metal Benchmarks
| Benchmark suite | Current: d63bf16 | Previous: 706b87f | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 1128208.5 ns | 1111917 ns | 1.01 |
| array/accumulate/Float32/dims=1 | 1553459 ns | 1537333 ns | 1.01 |
| array/accumulate/Float32/dims=1L | 9811959 ns | 9801916.5 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 1861625 ns | 1836250 ns | 1.01 |
| array/accumulate/Float32/dims=2L | 7211937.5 ns | 7183625 ns | 1.00 |
| array/accumulate/Int64/1d | 1252666 ns | 1241334 ns | 1.01 |
| array/accumulate/Int64/dims=1 | 1812667 ns | 1812541 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 11791646 ns | 11645625 ns | 1.01 |
| array/accumulate/Int64/dims=2 | 2152708 ns | 2147520.5 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 9746042 ns | 9754042 ns | 1.00 |
| array/broadcast | 1017500 ns | 1010999.5 ns | 1.01 |
| array/construct | 5625 ns | 5708 ns | 0.99 |
| array/permutedims/2d | 1152354.5 ns | 1148187.5 ns | 1.00 |
| array/permutedims/3d | 1656583 ns | 1656083 ns | 1.00 |
| array/permutedims/4d | 2628000 ns | 2633729 ns | 1.00 |
| array/private/copy | 552833 ns | 543083 ns | 1.02 |
| array/private/copyto!/cpu_to_gpu | 783084 ns | 780167 ns | 1.00 |
| array/private/copyto!/gpu_to_cpu | 784042 ns | 778333 ns | 1.01 |
| array/private/copyto!/gpu_to_gpu | 609375 ns | 611292 ns | 1.00 |
| array/private/iteration/findall/bool | 1429958 ns | 1320854 ns | 1.08 |
| array/private/iteration/findall/int | 1597208 ns | 1546271 ns | 1.03 |
| array/private/iteration/findfirst/bool | 1972875 ns | 1983083 ns | 0.99 |
| array/private/iteration/findfirst/int | 2033083 ns | 2003667 ns | 1.01 |
| array/private/iteration/findmin/1d | 2247834 ns | 2254458 ns | 1.00 |
| array/private/iteration/findmin/2d | 2018270.5 ns | 2007458 ns | 1.01 |
| array/private/iteration/logical | 2519167 ns | 2527292 ns | 1.00 |
| array/private/iteration/scalar | 5097708 ns | 5570229.5 ns | 0.92 |
| array/random/rand/Float32 | 1165709 ns | 1109542 ns | 1.05 |
| array/random/rand/Int64 | 1313458 ns | 1291000 ns | 1.02 |
| array/random/rand!/Float32 | 967208 ns | 940625 ns | 1.03 |
| array/random/rand!/Int64 | 895292 ns | 877666.5 ns | 1.02 |
| array/random/randn/Float32 | 1072021 ns | 1067875 ns | 1.00 |
| array/random/randn!/Float32 | 842104 ns | 835541.5 ns | 1.01 |
| array/reductions/mapreduce/Float32/1d | 1114666 ns | 1124229.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1 | 827437.5 ns | 832167 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1L | 1336417 ns | 1340166.5 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 843625 ns | 851250 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2L | 1772458 ns | 1763500 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 1544104 ns | 1551104 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1 | 1130041 ns | 1126666 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 2033749.5 ns | 2021125 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=2 | 1354583 ns | 1269083 ns | 1.07 |
| array/reductions/mapreduce/Int64/dims=2L | 3581208.5 ns | 3587375 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 1023291.5 ns | 1048583.5 ns | 0.98 |
| array/reductions/reduce/Float32/dims=1 | 828167 ns | 835563 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1L | 1346125 ns | 1342834 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 864750 ns | 854709 ns | 1.01 |
| array/reductions/reduce/Float32/dims=2L | 1769458 ns | 1793437.5 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 1517666 ns | 1509583 ns | 1.01 |
| array/reductions/reduce/Int64/dims=1 | 1116584 ns | 1109041 ns | 1.01 |
| array/reductions/reduce/Int64/dims=1L | 2013000 ns | 2022959 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 1158979.5 ns | 1149458 ns | 1.01 |
| array/reductions/reduce/Int64/dims=2L | 4162375 ns | 4176771 ns | 1.00 |
| array/shared/copy | 241125 ns | 239875 ns | 1.01 |
| array/shared/copyto!/cpu_to_gpu | 75812.5 ns | 78959 ns | 0.96 |
| array/shared/copyto!/gpu_to_cpu | 80625 ns | 80041 ns | 1.01 |
| array/shared/copyto!/gpu_to_gpu | 80000 ns | 80500 ns | 0.99 |
| array/shared/iteration/findall/bool | 1447084 ns | 1443687.5 ns | 1.00 |
| array/shared/iteration/findall/int | 1541604.5 ns | 1597292 ns | 0.97 |
| array/shared/iteration/findfirst/bool | 1563542 ns | 1560312.5 ns | 1.00 |
| array/shared/iteration/findfirst/int | 1579896 ns | 1587125 ns | 1.00 |
| array/shared/iteration/findmin/1d | 1845562.5 ns | 1847104 ns | 1.00 |
| array/shared/iteration/findmin/2d | 2020209 ns | 2008542 ns | 1.01 |
| array/shared/iteration/logical | 2409604.5 ns | 2411709 ns | 1.00 |
| array/shared/iteration/scalar | 187125 ns | 186000 ns | 1.01 |
| integration/byval/reference | 1567334 ns | 1561833 ns | 1.00 |
| integration/byval/slices=1 | 1555208 ns | 1557292 ns | 1.00 |
| integration/byval/slices=2 | 2611333 ns | 2611166.5 ns | 1.00 |
| integration/byval/slices=3 | 8553917 ns | 7720208.5 ns | 1.11 |
| integration/metaldevrt | 870958 ns | 861375 ns | 1.01 |
| kernel/indexing | 624875 ns | 637542 ns | 0.98 |
| kernel/indexing_checked | 614208.5 ns | 659708 ns | 0.93 |
| kernel/launch | 11375 ns | 11333 ns | 1.00 |
| kernel/rand | 586458 ns | 579167 ns | 1.01 |
| latency/import | 1380589000 ns | 1375283937.5 ns | 1.00 |
| latency/precompile | 28998263750 ns | 28780620500 ns | 1.01 |
| latency/ttfp | 1647912583 ns | 1640608042 ns | 1.00 |
| metal/synchronization/context | 19541 ns | 19084 ns | 1.02 |
| metal/synchronization/stream | 18208 ns | 17917 ns | 1.02 |
This comment was automatically generated by workflow using github-action-benchmark.
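In the table above, the Ratio column is Current divided by Previous, rounded to two decimals. These are run times, so lower is better and a ratio above 1.00 means the PR build ran slower; for example, integration/byval/slices=3 gives 8553917 / 7720208.5 ≈ 1.108, displayed as 1.11. A minimal Python sketch that recomputes a few rows copied from the table. The 1.10 flagging threshold is a hypothetical value chosen for illustration, not this workflow's configured alert threshold:

```python
# Recompute the "Ratio" column of the benchmark table: Current / Previous.
# Times are in ns and lower is better, so a ratio above 1.0 means a slowdown.

# A few rows copied from the table: name -> (current_ns, previous_ns)
ROWS = {
    "array/accumulate/Float32/1d": (1128208.5, 1111917),
    "array/private/iteration/scalar": (5097708, 5570229.5),
    "array/reductions/mapreduce/Int64/dims=2": (1354583, 1269083),
    "integration/byval/slices=3": (8553917, 7720208.5),
}

# Hypothetical cut-off for flagging a slowdown (illustrative only).
THRESHOLD = 1.10


def ratio(current_ns: float, previous_ns: float) -> float:
    """Return current / previous rounded to two decimals, as shown in the table."""
    return round(current_ns / previous_ns, 2)


def slowdowns(rows: dict, threshold: float = THRESHOLD) -> list:
    """Return the sorted names whose unrounded ratio exceeds the threshold."""
    return sorted(
        name for name, (cur, prev) in rows.items() if cur / prev > threshold
    )


if __name__ == "__main__":
    for name, (cur, prev) in ROWS.items():
        print(f"{name}: {ratio(cur, prev):.2f}")
    print("flagged:", slowdowns(ROWS))
```

Among these rows only integration/byval/slices=3 crosses the illustrative 1.10 cut-off; in the full table it is also the only row above 1.10.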
Force-pushed from 40871e3 to b88d77f.
Force-pushed from 130ed6a to e3aeeea.
Force-pushed from e8d6b2c to ffdffe8.
Force-pushed from ffdffe8 to 6f6aa3c.
Force-pushed from 3c46dc2 to 60077f5.
Force-pushed from 60077f5 to 25108fd.
Force-pushed from 25108fd to 57a51b7.
aplavin reviewed on Mar 23, 2026.
Force-pushed from 57a51b7 to e0db641.
Force-pushed from e0db641 to c115f38.
They can be added in a different PR
Claude was instructed to implement this like the MPSGraphs matmul caching
Force-pushed from 3d73d20 to d63bf16.
Adds FFT support for MtlArray via the AbstractFFTs.jl interface.
HEAVILY based on CUDA.jl's AbstractFFTs.jl interface implementation, using MPSGraph functionality.
Performance
Benchmarked on an Apple M2 Max (30-core GPU) against FFTW.jl on the CPU:
Example Usage
Close #270